ABSTRACT

This paper presents a novel method for the protection of copyrighted multimedia content. The problem of selective encryption (SE) is addressed jointly with compression for the state-of-the-art video codec H.264/AVC. SE is performed in the context-based adaptive binary arithmetic coding (CABAC) module of the video codec. For this purpose, CABAC is turned into an encryption cipher by scrambling bin strings of equal length. In our scheme, the CABAC engine serves as an encryption cipher without affecting the coding efficiency of H.264/AVC: it keeps exactly the same bit rate, generates a fully compliant bitstream and requires negligible computational power. Because the bit rate does not increase, our encryption algorithm is well suited for real-time multimedia streaming, and because the additional processing is negligible, it is well suited for playback on handheld devices. Nine benchmark video sequences containing different combinations of motion, texture and objects are used for the experimental evaluation of the proposed algorithm.


1. INTRODUCTION 


With the rapid evolution of digital media, the growth of processing power and the availability of network bandwidth, many multimedia applications have emerged in recent years. As digital data can easily be copied and modified, concerns regarding its protection and authentication have surfaced. Data encryption is used to restrict access to digital data to authenticated users only. For large volumes of video data with real-time constraints, SE is used, in which only a small part of the whole bitstream is encrypted [13]. In this work, we have transformed the CABAC module of H.264/AVC into an encryption cipher. We achieve this by scrambling part of the Exp-Golomb suffix of non-zero coefficients (NZs) and the sign bits of all NZs.

SE of H.264/AVC has been studied in [6], in which fields such as the intra-prediction mode, residual data, inter-prediction mode and motion vectors are encrypted. Encryption for H.264/AVC has also been discussed in [2], where the pixels of the macroblocks (MBs) belonging to a region of interest (ROI) are permuted. The drawback of this scheme is that the bit rate increases with the size of the ROI. This is because the permutation changes the statistics of the ROI: it is no longer a slowly varying region, which is the basic assumption for video signals. The use of a general entropy coder as an encryption cipher has been studied in [15]. It encrypts NZs by using different Huffman tables for each input symbol. The tables, as well as the order in which they are used, are kept secret. This technique is vulnerable to known-plaintext attacks, as explained in [4]. Key-based interval splitting of arithmetic coding (KSAC) [5] partitions the intervals in each iteration of arithmetic coding, and a secret key decides how each interval is partitioned. The number of subintervals into which an interval is divided should be kept small, since more subintervals increase the bit rate of the bitstream. Randomized arithmetic coding [3] also targets arithmetic coding, but instead of partitioning intervals as in KSAC, it uses a secret key to scramble the order of the intervals. Encrypted bitstream (EB) compliance is a required feature for multimedia applications, and both of these techniques make the bitstream non-compliant; hence, it cannot be decoded by a standard H.264/AVC decoder. We have previously presented an SE scheme for H.264/AVC based on context-based adaptive variable length coding (CAVLC) which fulfills real-time constraints by keeping the same bit rate and by generating a completely compliant bitstream [12].

The rest of the paper is organized as follows. Section 2 presents an overview of H.264/AVC and CABAC, explaining how CABAC works and what its limitations are from an encryption point of view. Section 3 describes the whole system architecture. Section 4 contains the experimental evaluation and performance analysis, including an analysis over a wide range of quantization parameter (QP) values and the efficiency for different benchmark video sequences. Section 5 presents concluding remarks about the proposed scheme.


2. PRELIMINARIES 
2.1 Overview of H.264/AVC 


H.264/AVC [1] is the state-of-the-art video coding standard of ITU-T and ISO/IEC. It offers better compression than previous video standards. As in previous standards, an input video frame is divided into blocks of 16x16 pixels, called macroblocks (MBs), each of which is encoded separately. Each MB can be encoded in intra or inter mode. In intra mode, the current MB is predicted spatially from MBs which have been previously encoded, decoded and reconstructed (the neighboring MBs at the top and left). In inter mode, motion-compensated prediction is performed from previous frames. The difference between the original and the predicted frame is called the residual. This residual is coded using transform coding followed by quantization and zigzag scanning. In the last step, one of the two entropy coding techniques, CAVLC or CABAC, is applied. On the decoding side, the compressed bitstream is decoded by the entropy decoding module, followed by an inverse zigzag scan. The resulting coefficients are then rescaled and inverse transformed to obtain the residual signal, which is added to the predicted signal to reconstruct the original signal.
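As an illustration of the zigzag step described above, the following Python sketch maps a 4x4 block to its 16-coefficient scan and back. The table is the 4x4 frame (progressive) zigzag order of H.264/AVC written as raster indices; the field scan used for interlaced coding is not shown, and the function names are illustrative only.

```python
# 4x4 frame zigzag order of H.264/AVC, as raster indices (row * 4 + col).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def zigzag_scan(block):
    """Flatten a 4x4 block (list of 4 rows) into 16 coefficients in zigzag order."""
    flat = [value for row in block for value in row]
    return [flat[i] for i in ZIGZAG_4X4]


def inverse_zigzag_scan(coeffs):
    """Put 16 zigzag-ordered coefficients back into a 4x4 block (decoder side)."""
    flat = [0] * 16
    for scan_pos, raster_idx in enumerate(ZIGZAG_4X4):
        flat[raster_idx] = coeffs[scan_pos]
    return [flat[r * 4:(r + 1) * 4] for r in range(4)]


block = [[9, 3, 0, 0],
         [2, 1, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
scanned = zigzag_scan(block)
print(scanned)  # [9, 3, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
assert inverse_zigzag_scan(scanned) == block
```

Low-frequency coefficients come first, so the trailing zeros typical of quantized residuals cluster at the end of the scan.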


H.264/AVC has some additional features compared to previous video standards. The standard DCT has been replaced by an integer transform (IT) [7], which needs no multiplication and can be implemented with only additions and shifts. It can be implemented in 16-bit integer arithmetic and hence removes the problem of mismatch among codec implementations on different processor architectures. In the baseline profile of H.264/AVC, a 4x4 transform is used, in contrast to the 8x8 transform of previous standards; higher profiles offer adaptive transform block sizes. A uniform scalar quantizer is used. For inter frames, H.264/AVC supports variable block-size motion estimation with 16x16, 16x8, 8x16 and 8x8 partitions, and each 8x8 partition can be further divided down to 4x4 blocks. Quarter-pixel motion estimation, multiple reference frames, and improved skipped and direct motion inference have also been included in this codec. For intra frames, prediction has moved to the spatial domain. Owing to all these additional features, H.264/AVC outperforms previous video coding standards [14].
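The add-and-shift property can be seen in a small Python sketch of the 4x4 forward core transform $Y = C_f X C_f^T$ with $C_f = [[1,1,1,1],[2,1,-1,-2],[1,-1,-1,1],[1,-2,2,-1]]$. Each 1-D pass is a butterfly in which the multiplications by 2 are left shifts. The normalization that H.264/AVC merges into quantization is deliberately left out, and the function name is illustrative.

```python
def forward_core_transform_4x4(block):
    """4x4 forward core transform Cf * X * Cf^T using only additions and shifts.

    block is a list of 4 rows of integers; the scaling that H.264/AVC folds
    into quantization is not applied here.
    """

    def butterfly(v):
        # One 1-D pass: multiply the 4-vector v by Cf without multiplications.
        s0 = v[0] + v[3]
        s3 = v[0] - v[3]
        s1 = v[1] + v[2]
        s2 = v[1] - v[2]
        return [s0 + s1, (s3 << 1) + s2, s0 - s1, s3 - (s2 << 1)]

    rows = [butterfly(row) for row in block]                              # horizontal pass
    cols = [butterfly([rows[r][c] for r in range(4)]) for c in range(4)]  # vertical pass
    # cols[c][r] holds the output at (row r, column c): transpose back.
    return [[cols[c][r] for c in range(4)] for r in range(4)]


dc_only = forward_core_transform_4x4([[1] * 4 for _ in range(4)])
print(dc_only[0][0])  # 16
```

A flat block puts all its energy into the DC coefficient, which is the expected behavior of a DCT-like transform.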


2.2 Context-based Adaptive Binary Arithmetic Coding 


In CABAC entropy coding, the significance map is coded in zigzag scan order, while the levels of the quantized transform coefficients are coded in reverse scan order (from high frequency to low frequency), as shown in Fig. 1. In this paper, we present an encryption scheme based on CABAC [8]. CABAC is designed to better exploit the characteristics of NZs; it requires more processing than CAVLC but offers about 10% better compression on average [9]. Run-length coding has been replaced by significance map (SM) coding, which specifies the positions of the NZs in the 4x4 block. The binary arithmetic coding (BAC) module of CABAC uses many context models to encode NZs, and the context model selected for a specific NZ depends on the NZs that have already been coded in the current block.
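To make the significance-map idea concrete, the following Python sketch derives, for one block of zigzag-ordered coefficients, the significant_coeff_flag / last_significant_coeff_flag bins and the (coeff_abs_level_minus1, coeff_sign_flag) pairs in the reverse scan order used for levels. It assumes the block has at least one NZ (CBF = 1) and ignores context modeling and arithmetic coding; the helper name and its return format are hypothetical.

```python
def significance_map(scanned):
    """Significance-map bins and reverse-order levels for one coded block.

    scanned: coefficients in zigzag order (at least one must be non-zero).
    Returns (flags, levels):
      flags  - list of (syntax element name, bin) in scan order,
      levels - (coeff_abs_level_minus1, coeff_sign_flag) in reverse scan order.
    """
    nonzero_positions = [i for i, c in enumerate(scanned) if c != 0]
    if not nonzero_positions:
        raise ValueError("block has no NZs (CBF = 0): nothing else is coded")
    last = nonzero_positions[-1]
    flags = []
    for i in range(len(scanned) - 1):    # the final scan position is never signalled
        sig = int(scanned[i] != 0)
        flags.append(("significant_coeff_flag", sig))
        if sig:
            is_last = int(i == last)
            flags.append(("last_significant_coeff_flag", is_last))
            if is_last:
                break
    levels = [(abs(scanned[i]) - 1, int(scanned[i] < 0))
              for i in reversed(nonzero_positions)]
    return flags, levels


flags, levels = significance_map([0, 3, 0, -1] + [0] * 12)
print([b for _, b in flags])  # [0, 1, 0, 0, 1, 1]
print(levels)                 # [(0, 1), (2, 0)]
```

No flag is sent for the final scan position: if the loop reaches it without a last flag equal to 1, the decoder infers that this coefficient is significant.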


EB compliance is a required feature for some direct operations on the bitstream (displaying, time seeking, cutting, etc.). In each MB, the header information is encoded first, followed by the MB data. The EB is compliant if the following three conditions are fulfilled:

• To keep the bit rate of the EB the same as that of the original bitstream, the encrypted bin string must have the same length as the original bin string.

• The encrypted bin string must be a valid codeword so that it can be decoded by the entropy decoder.

• The value of the syntax element decoded from the encrypted bin string must fall within the valid range for that syntax element. Any syntax element which is predicted from neighboring MBs should not be encrypted; otherwise, the drift in the value of the syntax element keeps increasing, and after a few iterations the value falls outside the valid range and the bitstream is no longer decodable.
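The drift argument in the third condition can be checked with a toy example. The Python sketch below assumes a syntax element reconstructed as predictor plus difference (DPCM-style, like a motion vector predicted from its neighbors), a hypothetical "encryption" that simply negates each difference, and a hypothetical valid range of [-10, 15]. A decoder without the key accumulates the error of every step, so the deviation keeps growing until the value leaves the valid range.

```python
def reconstruct(differences, start=0):
    """Rebuild predicted values: v[i] = v[i-1] + d[i], with v[-1] = start."""
    values, current = [], start
    for d in differences:
        current += d
        values.append(current)
    return values


def first_out_of_range(values, lo, hi):
    """Index of the first value outside [lo, hi], or None if all are valid."""
    for i, v in enumerate(values):
        if not lo <= v <= hi:
            return i
    return None


true_diffs = [1, 2, 1, 1, 2, 1, 1, 2]
scrambled = [-d for d in true_diffs]      # hypothetical encryption of the differences
original = reconstruct(true_diffs)
drifted = reconstruct(scrambled)          # what a decoder without the key rebuilds
errors = [a - b for a, b in zip(original, drifted)]
print(errors)                                # [2, 6, 8, 10, 14, 16, 18, 22]
print(first_out_of_range(drifted, -10, 15))  # 7
```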


To keep the bitstream compliant, we cannot encrypt the MB header data, since it is used to predict future MBs. The MB data contains the NZs and can be encrypted. An MB is further divided into 16 blocks of 4x4 pixels to be processed by the IT module. The coded block pattern (CBP) is a syntax element indicating which 8x8 blocks within a macroblock contain NZs. The macroblock mode (MBmode) indicates whether an MB is skipped or not; if the MB is not skipped, MBmode indicates the prediction method used for that MB. For a 4x4 block inside an MB, if CBP and MBmode are set, the block is encoded. Inside a 4x4 block, the coded block flag (CBF) is the syntax element indicating whether the block contains NZs. CBF is encoded first. If CBF is zero, no further data is transmitted; otherwise, it is followed by the encoding of the SM. Finally, the absolute value of each NZ and its sign are encoded. As with the MB header, the header information of a 4x4 block, which includes the CBF and the SM, should not be encrypted, for the sake of bitstream compliance.

Figure 1: Scanning order of NZs in CABAC.

Figure 2: Block diagram of CABAC of H.264/AVC.


CABAC consists of multiple stages, as shown in Fig. 2. First, binarization converts each non-binary syntax element into a binary codeword called a bin string, which is more amenable to compression by the BAC. The binary representation of a non-binary syntax element is chosen so that it is close to a minimum-redundancy code. In CABAC, there are four basic code trees for the binarization step, namely the unary code, the truncated unary (TU) code, the kth-order Exp-Golomb (EGk) code and the fixed-length (FL) code, as shown in Fig. 3.


For an unsigned integer value $x \ge 0$, the unary code consists of $x$ 1's plus a terminating 0 bit. TU is only defined for $x$ with $0 \le x \le S$. For $x < S$ the code is given by the unary code, whereas for $x = S$ the terminating 0 bit is omitted. EGk is constructed by concatenating a prefix and a suffix codeword and is suitable for binarizing syntax elements that represent prediction residuals. For a given unsigned integer value $x \ge 0$, the prefix part of the EGk codeword consists of a unary code corresponding to the length $l(x) = \lfloor \log_2(x/2^k + 1) \rfloor$. The EGk suffix part is computed as the binary representation of $x + 2^k(1 - 2^{l(x)})$ using $k + l(x)$ significant bits. Consequently, the EGk codeword of $x$ has length $2l(x) + k + 1$, and the number of symbols sharing the same code length grows geometrically with $l(x)$. When $k = 0$, the code length is $2l(x) + 1$.
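The three variable-length code trees defined above translate directly into code. This Python sketch returns bin strings as text made of '0' and '1' characters for readability and follows the $l(x)$ and suffix formulas given in the text; the function names are illustrative.

```python
def unary(x):
    """Unary code: x ones followed by a terminating zero (x >= 0)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return "1" * x + "0"


def truncated_unary(x, s):
    """Truncated unary (TU) code with cutoff s; the terminating zero is dropped for x == s."""
    if not 0 <= x <= s:
        raise ValueError("TU is only defined for 0 <= x <= s")
    return "1" * s if x == s else unary(x)


def exp_golomb(x, k):
    """k-th order Exp-Golomb code: unary prefix of l(x), then k + l(x) suffix bits."""
    if x < 0:
        raise ValueError("x must be non-negative")
    l = (x + (1 << k)).bit_length() - 1 - k        # l(x) = floor(log2(x / 2^k + 1))
    suffix_value = x + (1 << k) * (1 - (1 << l))   # x + 2^k (1 - 2^l(x))
    suffix = format(suffix_value, f"0{k + l}b") if k + l else ""
    return unary(l) + suffix


print(exp_golomb(3, 0))  # 11000
```

For example, `exp_golomb(3, 0)` is the unary prefix `110` for $l(3) = 2$ followed by the 2-bit suffix `00`, i.e. $2l(x) + 1 = 5$ bins.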

FL is applied to syntax elements with a nearly uniform distribution, or to syntax elements for which each bit in the FL bin string represents a specific coding decision, e.g., the luma part of the CBP.

Three syntax elements are binarized by concatenating these basic code trees: the CBP, the NZ levels and the motion vector difference (MVD). The absolute level of NZs is binarized by concatenating TU and EG0 (UEG0), with TU forming the prefix part with cutoff value $S = 14$. Binarization and the subsequent arithmetic coding are applied to the syntax element coeff_abs_level_minus1 = abs_level - 1, since quantized transform coefficients with zero magnitude are already signalled by the SM.

For MVD, the bin string is constructed by concatenating TU and EG3 (UEG3), with TU forming the prefix part with cutoff value $S = 9$. The suffix part of an MVD contains the EG3 code of $|\mathrm{MVD}| - 9$ when $|\mathrm{MVD}| \ge 9$, and a sign bit is appended for every non-zero MVD.
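Both concatenated binarizations follow one pattern: a TU prefix up to a cutoff, an EGk suffix for the part above the cutoff, and an optional sign bin. The Python sketch below mirrors the UEGk procedure of the H.264/AVC standard and is parameterized for the two uses described above (k = 0, cutoff 14, unsigned, for coeff_abs_level_minus1; k = 3, cutoff 9, signed, for MVD). The function name is illustrative.

```python
def uegk(value, k, u_coff, signed):
    """UEGk bin string: TU prefix (cutoff u_coff), EGk suffix, optional sign bin."""
    abs_val = abs(value)
    bins = "1" * min(abs_val, u_coff)
    if abs_val < u_coff:
        bins += "0"                    # TU terminator, only below the cutoff
    else:
        suf = abs_val - u_coff         # EGk code of the part above the cutoff
        while suf >= (1 << k):         # EGk prefix: one '1' per doubling
            bins += "1"
            suf -= 1 << k
            k += 1
        bins += "0"
        bins += "".join(str((suf >> b) & 1) for b in range(k - 1, -1, -1))
    if signed and value != 0:
        bins += "1" if value < 0 else "0"   # sign bin: 1 means negative
    return bins


coeff_bins = uegk(16, k=0, u_coff=14, signed=False)  # coeff_abs_level_minus1 for |level| = 17
mvd_bins = uegk(-17, k=3, u_coff=9, signed=True)     # MVD = -17
```

Only the bins after the TU prefix (the EGk part) and the sign bin have lengths that do not depend on the exact value within a group, which is what the selection of encryptable bins relies on.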

Among the four binarization techniques, the unary and TU codewords have a different length for each input value. They do not fulfill the first condition, and scrambling them would change the bit rate of the bitstream. Bit rate is a very important factor for multimedia streaming applications over the Internet, and an increase in bit rate degrades the performance of such applications. The EGk suffix and FL codes, on the other hand, can be scrambled while keeping the bit rate unchanged. EGk is used for binarizing the absolute value of levels and MVDs. Many MVD suffix bin strings have the same length, so scrambling among them yields valid codewords of the same length and satisfies the first two conditions. However, since MVDs are part of the MB header and are used to predict future motion vectors, their encryption violates the third condition and makes the bitstream non-compliant. The syntax elements which fulfill all the criteria for encryption of an H.264/AVC compliant bitstream are therefore the EG0 suffix and the sign bits of levels. Hence, for each NZ with $|NZ| > 14$, the EG0 suffix is encrypted by scrambling its $l(x)$ bits. This is followed by encryption of the syntax element coeff_sign_flag, which represents the sign of every non-zero level. FL is used for binarizing syntax elements which belong to the MB header, and these cannot be encrypted.
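The selection rule above can be summarized, for one non-zero level, as a split of its bins into a part kept in the clear and a part that may be encrypted. The Python sketch below is a hypothetical illustration that works on bin strings before arithmetic coding; it does not model the CABAC engine itself, and the helper name is an assumption.

```python
def split_level_bins(level, u_coff=14):
    """Classify the bins of one non-zero level for the selective encryption rule.

    Returns (clear_bins, suffix_bins, sign_bin):
      clear_bins  - TU prefix plus EG0 prefix; left unencrypted (length-changing),
      suffix_bins - the l(x) EG0 suffix bits; present only when |level| > 14,
      sign_bin    - coeff_sign_flag; encrypted for every non-zero level.
    """
    if level == 0:
        raise ValueError("only non-zero levels are coded")
    value = abs(level) - 1                          # coeff_abs_level_minus1
    clear = "1" * min(value, u_coff) + ("0" if value < u_coff else "")
    suffix = ""
    if value >= u_coff:                             # i.e. |level| > 14
        x = value - u_coff
        l = (x + 1).bit_length() - 1                # l(x) = floor(log2(x + 1))
        clear += "1" * l + "0"                      # EG0 prefix: unary code of l(x)
        if l:
            suffix = format(x - ((1 << l) - 1), f"0{l}b")
    sign = "1" if level < 0 else "0"
    return clear, suffix, sign


print(split_level_bins(-18))
```

For |level| = 15 the suffix is empty ($l(x) = 0$), so only the sign is encrypted; levels with $|NZ| \le 14$ likewise contribute only their sign bin.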


3. SYSTEM ARCHITECTURE 


To keep the bit rate intact, each NZ is scrambled only among those NZs whose EG0 bin strings have the same length. We initialize a pseudo-random number generator (PRNG) with a secret key. The EG0 codes having the same code length constitute the scrambling space, which therefore depends on the absolute value of the NZ. The permutation space is $\log_2(n+1)$, where $n$ is the suffix part of the absolute value of the NZ. The block diagram of our scheme is shown in Fig. 4.
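A small sketch of the scrambling space, assuming it is the set of absolute levels whose UEG0 bin strings have the same length as the level being encrypted: for an EG0 suffix of $l(x)$ bits there are $2^{l(x)}$ such levels. Both helper names are hypothetical.

```python
def scrambling_space(abs_level):
    """Inclusive range of |NZ| values whose UEG0 bins have the same length as abs_level.

    Returns None when |NZ| <= 14: there is no EG0 suffix, so only the sign
    can be encrypted.
    """
    if abs_level <= 14:
        return None
    x = abs_level - 15                    # EG0-coded part: coeff_abs_level_minus1 - 14
    l = (x + 1).bit_length() - 1          # number of suffix bits l(x)
    low = 15 + (1 << l) - 1               # smallest x with l suffix bits is 2^l - 1
    high = 15 + (1 << (l + 1)) - 2        # largest x with l suffix bits is 2^(l+1) - 2
    return low, high


def ueg0_length(abs_level):
    """Number of UEG0 bins for coeff_abs_level_minus1 = abs_level - 1."""
    value = abs_level - 1
    if value < 14:
        return value + 1                  # TU part only: value ones plus a zero
    x = value - 14
    l = (x + 1).bit_length() - 1
    return 14 + 2 * l + 1                 # 14 TU ones + EG0 code of length 2l + 1


print(scrambling_space(20))  # (18, 21)
```

For instance, |NZ| = 20 can only be mapped within 18 to 21, so the bin-string length cannot change.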


3.1 Encryption Process 


The encryption process is shown as a block diagram in Fig. 5. Let $x$ be the suffix part of the absolute level of an NZ, encoded using EG0, that is to be encrypted. The encrypted coefficient $y$ is given by:

$$y = (x + Y) \bmod \log_2(x+1), \tag{1}$$

where $Y$ is given by:

$$Y = \mathrm{rand}() \bmod \log_2(x+1). \tag{2}$$
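The following Python sketch illustrates one way to realize this encryption step for a single non-zero level. It makes three assumptions that are not taken from the text: the key-dependent shift is applied to the $l(x)$-bit EG0 suffix field modulo $2^{l(x)}$ (the size of the same-length code set), the PRNG is Python's `random.Random` seeded with the key (convenient for illustration but not cryptographically secure), and coeff_sign_flag is encrypted by XOR with one key-stream bit. It is therefore a variant of Eqs. (1) and (2), not a transcription of them, and the function name is hypothetical.

```python
import random


def encrypt_level(level, rng):
    """Scramble one non-zero level without changing its UEG0 bin-string length.

    Illustrative assumptions: Y shifts the l(x)-bit EG0 suffix field modulo
    2^l(x), and the sign flag is XORed with one key-stream bit.
    """
    if level == 0:
        raise ValueError("only non-zero levels are coded")
    abs_level = abs(level)
    sign = 1 if level < 0 else 0
    if abs_level >= 15:                    # |NZ| > 14: EG0 suffix present
        x = abs_level - 15                 # EG0-coded part (coeff_abs_level_minus1 - 14)
        l = (x + 1).bit_length() - 1       # suffix length l(x) = floor(log2(x + 1))
        m = 1 << l                         # 2^l(x) codes share this length
        base = m - 1                       # smallest x with l(x) suffix bits
        Y = rng.randrange(m) if l else 0   # no suffix bits when l(x) = 0
        s = (x - base + Y) % m             # shifted suffix value stays within l(x) bits
        abs_level = 15 + base + s
    sign ^= rng.getrandbits(1)             # encrypt coeff_sign_flag
    return -abs_level if sign else abs_level


rng = random.Random(0x5EC2E7)              # hypothetical secret key used as the PRNG seed
encrypted = [encrypt_level(v, rng) for v in (3, -15, 17, 20, -40)]
```

Levels with $|NZ| \le 14$ keep their magnitude and only their sign may flip, which matches the rule that their TU bins are never touched.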
Figure 3: Block diagram of the binarization stage illustrating the different binarization methods.

Figure 4: Block diagram of the encryption and decryption process in H.264/AVC.


3.2 Decryption Process 


For the decryption of NZs in the H.264/AVC decoder, the process is performed in reverse order in the context-based adaptive binary arithmetic decoding (CABAD) module of the H.264/AVC decoder. The same secret key is used as the seed of the PRNG to produce $Y$. The original value of the EG0 suffix of $|NZ|$ can thus be recovered from the encrypted NZ using:

$$x = (y + \log_2(x+1) - Y) \bmod \log_2(x+1). \tag{3}$$
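The inverse operation can be sketched as a hypothetical illustration under the following assumptions: the suffix field is shifted modulo $2^{l(x)}$, the sign is XORed with a key-stream bit, and the key stream comes from a generator such as Python's `random.Random` seeded with the shared key (not cryptographically secure). Because encryption preserves the bin-string length, the decoder recovers the same $l(x)$ from the encrypted level and can draw the key-stream values in the same order; as in Eq. (3), the modulus is added before subtracting $Y$.

```python
def decrypt_level(level, rng):
    """Undo a length-preserving scrambling of one non-zero level.

    rng must be seeded with the same key as the encoder and consumed in the
    same order: one suffix draw when l(x) > 0, then one sign bit.
    """
    abs_level = abs(level)
    sign = 1 if level < 0 else 0
    if abs_level >= 15:                       # EG0 suffix present (|NZ| > 14)
        y = abs_level - 15                    # encrypted EG0-coded value
        l = (y + 1).bit_length() - 1          # same l(x) as before encryption
        m = 1 << l                            # size of the same-length code set
        base = m - 1                          # smallest value with l suffix bits
        Y = rng.randrange(m) if l else 0
        # "+ m" mirrors Eq. (3); Python's % is already non-negative, C's is not.
        s = (y - base + m - Y) % m
        abs_level = 15 + base + s
    sign ^= rng.getrandbits(1)                # undo the XOR on coeff_sign_flag
    return -abs_level if sign else abs_level
```

In this sketch the decoder would create `random.Random(key)` with the encoder's key and call `decrypt_level` on the NZ levels in the same order in which the encoder processed them.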


4. EXPERIMENTAL RESULTS 


We have used the H.264 reference implementation JSVM 10.2 in AVC mode, with video sequences in QCIF resolution. Nine benchmark video sequences have been used for the analysis. Each of them represents a different combination of motion (fast/slow, pan/zoom/rotation), color (bright/dull), contrast (high/low) and objects (vehicles, buildings, people). The sequences 'bus', 'city' and 'foreman' contain camera motion, while 'football' and 'soccer' contain camera panning and zooming along with object motion and background texture. The sequences 'harbour' and 'ice' contain high-luminance images with smooth motion. The 'mobile' sequence contains a complex still background and foreground motion.
Figure 5: Encryption process for NZs and their signs in CABAC of H.264/AVC.


4.1 Intra Frames 


To demonstrate the efficiency of our proposed scheme, we have compressed 100 frames of each sequence at 30 fps as intra frames. Fig. 6 shows the encrypted video frames at different quantization parameter (QP) values for the 'foreman' sequence. Their PSNR values are given in Table 1 and are compared with the PSNR obtained for the same video frames without encryption. Whatever the QP value, the quality of the encrypted video remains in the same low range (below 10 dB on average for luma). This shows that the effectiveness of our algorithm does not depend on the QP value.
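The text does not spell out how PSNR is computed. Assuming the usual definition for 8-bit samples, $\mathrm{PSNR} = 10\log_{10}(255^2/\mathrm{MSE})$ evaluated per plane (Y, U or V), a minimal Python sketch is:

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized planes given as flat sample sequences."""
    if len(original) != len(decoded):
        raise ValueError("planes must have the same size")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf                    # identical planes
    return 10 * math.log10(peak ** 2 / mse)


print(round(psnr([10, 10, 10, 10], [11, 9, 10, 10]), 2))  # 51.14
```

With this definition, a luma PSNR below 10 dB means an MSE above $255^2/10 = 6502.5$.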

Table 2 compares the average PSNR of 100 frames of all benchmark video sequences at QP 18 without and with SE. It confirms that the algorithm works well and efficiently for various combinations of motion, texture and objects. Although the computational cost of the encryption depends on the video content and the quantization value, it is negligible compared to the overall computation of the video codec. The average luma PSNR over all sequences at QP 18 is 9.73 dB.


4.2 Intra & Inter Frames 


Video data normally consists of intra frames, each followed by a series of inter frames. Intra frames are inserted periodically to limit the drift caused by lossy compression and rounding errors. In this evaluation, the intra period is set to 10 in a sequence of 100 frames. The results in Table 3 verify the effectiveness of our scheme over the whole range of QP values for 'foreman'. Table 4 verifies the performance of our algorithm on all video sequences for intra and inter frames. The average luma PSNR over all sequences is 9.68 dB.


5. CONCLUSION 


In this paper, a novel framework for SE of H.264/AVC based on CABAC has been presented. Real-time constraints are satisfied by keeping the same bit rate and producing a compliant bitstream. The encrypted bitstream is H.264/AVC format compliant and can be played back by any standard H.264/AVC decoder. Because the bit rate does not increase, our encryption scheme is suitable for heterogeneous multimedia streaming scenarios in real-time environments.


Table 1: Comparison of PSNR without encryption and with SE for the foreman sequence at different QP values for intra frames.

QP | PSNR (Y) (dB)       | PSNR (U) (dB)       | PSNR (V) (dB)
   | Without SE  With SE | Without SE  With SE | Without SE  With SE
12 | 50.05       8.92    | 49.99       24.08   | 50.78       23.84
18 | 44.43       8.42    | 45.62       23.87   | 47.42       22.14
24 | 39.40       8.38    | 41.70       24.87   | 43.86       22.70
30 | 34.93       8.92    | 39.38       24.60   | 40.99       22.14
36 | 30.80       8.89    | 37.33       24.65   | 38.10       22.90
42 | 27.03       8.93    | 35.87       24.24   | 36.41       23.94

Table 2: Comparison of PSNR without encryption and with SE for the benchmark video sequences at QP 18 for intra frames.

Seq.     | PSNR (Y) (dB)        | PSNR (U) (dB)  | PSNR (V) (dB)
         | Orig.   SE           | Orig.   SE     | Orig.   SE
bus      | 44.26   [illegible]  | 45.22   25.10  | 46.50   26.86
city     | 44.28   11.52        | 45.83   30.50  | 46.76   31.86
crew     | 44.81   [illegible]  | 45.81   23.80  | 45.66   19.90
football | 44.59   11.46        | 45.70   15.79  | 45.98   23.10
foreman  | 44.43   8.42         | 45.62   23.87  | 47.42   22.14
harbour  | 44.10   9.48         | 45.60   23.82  | 46.63   31.20
ice      | 46.56   10.37        | 48.70   25.42  | 49.19   19.73
mobile   | 44.45   8.42         | 44.14   13.47  | 44.04   11.11
soccer   | 44.26   10.84        | 46.59   19.69  | 47.82   24.83


Table 3: Comparison of PSNR without encryption and with SE for the foreman sequence at different QP values for intra and inter frames.

QP | PSNR (Y) (dB)       | PSNR (U) (dB)       | PSNR (V) (dB)
   | Without SE  With SE | Without SE  With SE | Without SE  With SE
12 | 49.54       8.41    | 49.89       23.34   | 50.63       22.16
18 | 43.91       9.23    | 45.50       26.06   | 47.55       21.11
24 | 38.90       8.61    | 42.04       24.62   | 44.29       21.83
30 | 34.59       9.19    | 39.84       24.02   | 41.56       25.18
36 | 30.76       8.78    | 37.96       25.12   | 38.86       23.50
42 | 26.61       8.31    | 36.34       25.30   | 36.92       27.06
Figure 6: Decoding of the encrypted video "foreman": first frame with QP equal to (a) 12, (b) 18, (c) 24, (d) 30, (e) 36, (f) 42.




The experiments have shown that we can achieve the desired level of encryption in each frame while maintaining H.264/AVC format compliance for both intra and inter frames, with minimal computational requirements. Hence, the scheme is well suited for multimedia playback on handheld devices. The proposed system can be extended to ROI-specific video protection [11] for video surveillance and can be applied to medical image transmission [10].